Error-Based Pruning of Decision Trees Grown on Very Large Data Sets Can Work!

نویسندگان

  • Lawrence O. Hall
  • Richard Collins
  • Kevin W. Bowyer
  • Robert E. Banfield
چکیده

It has been asserted that, using traditional pruning methods, growing decision trees with increasingly larger amounts of training data will result in larger tree sizes even when accuracy does not increase. With regard to error-based pruning, the experimental data used to illustrate this assertion have apparently been obtained using the default setting for pruning strength; in particular, using the default certainty factor of 25 in the C4.5 decision tree implementation. We show that, in general, an appropriate setting of the certainty factor for error-based pruning will cause decision tree size to plateau when accuracy is not increasing with more training data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Is Error-Based Pruning Redeemable?

Error based pruning can be used to prune a decision tree and it does not require the use of validation data. It is implemented in the widely used C4.5 decision tree software. It uses a parameter, the certainty factor, that affects the size of the pruned tree. Several researchers have compared error based pruning with other approaches, and have shown results that suggest that error based pruning...

متن کامل

Reduced - error pruning with signi cance testsEIBE

When building classiication models, it is common practice to prune them to counter spurious eeects of the training data: this often improves performance and reduces model size. \Reduced-error pruning" is a fast pruning procedure for decision trees that is known to produce small and accurate trees. Apart from the data from which the tree is grown, it uses an independent \pruning" set, and prunin...

متن کامل

مطالعات درخت تصمیم در برآورد ریسک ابتلا به سرطان سینه با استفاده از چند شکلی‌های تک نوکلوئیدی

Abstract Introduction:   Decision tree is the data mining tools to collect, accurate prediction and sift information from massive amounts of data that are used widely in the field of computational biology and bioinformatics. In bioinformatics can be predict on diseases, including breast cancer. The use of genomic data including single nucleotide polymorphisms is a very important ...

متن کامل

Appears in Ecml-98 as a Research Note a Longer Version Is Available as Ece Tr 98-3, Purdue University Pruning Decision Trees with Misclassiication Costs 1 Pruning Decision Trees

We describe an experimental study of pruning methods for decision tree classiiers when the goal is minimizing loss rather than error. In addition to two common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and one pruning variant based on the Laplace correction. We perform an empirical com...

متن کامل

Using Impurity and Depth for Decision Trees Pruning

Most pruning methods for decision trees minimize a classification error rate. In uncertain domains, some subtrees which do not lessen the error rate can be relevant to point out some populations of specific interest or to give a representation of a large data file. We propose here a new pruning method (called pruning) which takes into account the complexity of sub-trees and which is able to kee...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002